Learning spoken document similarity and recommendation using supervised probabilistic latent semantic analysis

نویسندگان

  • Kishan Thambiratnam
  • Frank Seide
چکیده

This paper presents a model-based approach to spoken document similarity called Supervised Probabilistic Latent Semantic Analysis (PLSA). The method differs from traditional spoken document similarity techniques in that it allows similarity to be learned rather than approximated. The ability to learn similarity is desirable in applications such as Internet video recommendation, in which complex relationships like userpreference or speaking style need to be predicted. The proposed method exploits prior knowledge of document relationships to learn similarity. Experiments on broadcast news and Internet video corpora yielded 16.2% and 9.7% absolute mAP gains over traditional PLSA. Additionally, a cascaded Supervised+Discriminative PLSA system achieved a 3.0% absolute mAP gain over a Discriminative PLSA system, demonstrating the complementary nature of Supervised and Discriminative PLSA training.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

متن کامل

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

متن کامل

Spoken Lecture Summarization by Random Walk over a Graph Constructed with Automatically Extracted Key Terms

This paper proposes an improved approach for spoken lecture summarization, in which random walk is performed on a graph constructed with automatically extracted key terms and probabilistic latent semantic analysis (PLSA). Each sentence of the document is represented as a node of the graph and the edge between two nodes is weighted by the topical similarity between the two sentences. The basic i...

متن کامل

Hierarchical topic organization and visual presentation of spoken documents using probabilistic latent semantic analysis (PLSA) for efficient retrieval/browsing applications

The most attractive form of future network content will be multi-media including speech information, and such speech information usually carries the core concepts for the content. As a result, the spoken documents associated with the multi-media content very possibly can serve as the key for retrieval and browsing. This paper presents a new approach of hierarchical topic organization and visual...

متن کامل

Learning aspect models with partially labeled data

0167-8655/$ see front matter 2010 Elsevier B.V. A doi:10.1016/j.patrec.2010.09.004 ⇑ Corresponding author. Address: National Centre f okritos”, Athens, Greece. Tel.: +302106503204; fax: + E-mail address: [email protected] (A. Kri In this paper, we address the problem of learning aspect models with partially labeled data for the task of document categorization. The motivation of this w...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007